ABSTRACT: DNA profiles from multiple-contributor samples are
interpreted by comparing the probabilities of the profiles under
alternative propositions. The propositions may specify some known
contributors to the sample and may also specify a number of unknown
contributors. The probability of the alleles carried by the set of
people, known or unknown, depends on the allelic frequencies and also
upon any relationships among the people. Membership of the same
subpopulation implies a relationship from a shared evolutionary
history, and this effect has been incorporated into the probabilities.
This acknowledgment of the effects of population structure requires
account to be taken of all people in a subpopulation who are typed,
whether or not they contributed to the sample.

The interpretation of DNA profiles from more than one contributor is
one of the most challenging tasks facing forensic scientists. Part of
the complexity is due to the very large number of combinations of
genotypes that must be considered in some situations, although a body
of theory for a coherent treatment of mixed stains is now available
(1-3). For a defendant who is not excluded from a mixed stain this
theory avoids the potential prejudice that can follow from simplistic
"random man not excluded" arguments.

In some cases, the typing technology may allow complexity to be
avoided. When fragments are detected in ways that allow
semi-quantitation of the amount of DNA for each allele it may be
possible to determine which alleles are from the same contributor.
Examples include fluorescently-labeled length variants detected by
lasers, or silver staining to detect band intensity on a gel. There can
still be doubt, however, especially when different people contribute
more or less equally to the mixture, and such problems increase with
the number of contributors. As long as a quantitative assessment of the
evidentiary strength of DNA mixtures is required, we believe that there
will be a need for analyses that consider all possible sets of
genotypes that would lead to the mixture profile.

Our previous treatment (3) assumed independence of all the alleles in
the mixed profile. This means independence within individuals, implying
Hardy-Weinberg and linkage equilibrium, as well as independence between
individuals, meaning that the contributors are unrelated. Although
these assumptions may be adequate in many situations, they ignore the
low-level dependence among alleles within the same population due to
evolutionary forces. Two people within the same population must have
common ancestors at some point in the past, the point being closer for
smaller populations, and this imposes a dependence between their
alleles. A necessary corollary to this evolutionary relationship is the
low degree of inbreeding among offspring of two parents from the same
population. It is this logic that leads to the necessity of working
with conditional profile probabilities rather than the profile
probabilities themselves, and it is what led to Recommendation 4.2 of
the second NRC report (2). Instead of determining the probability of
finding a profile in a random member of a population, it is necessary
to determine the probability of finding the profile given that the
profile has been seen once already. Conditional probabilities take
explicit account of allelic dependencies.

In this paper we extend our previous treatment to allow for the
dependencies among all the alleles carried by the contributors to the
mixture. Initially we will assume that all contributors belong to the
same population, as this is likely to maximize the effects we are
considering. We will also adopt the relatively simple formulation for
the probabilities of sets of alleles advocated by Balding and Nichols
(4). Less restrictive treatments (5) would be unwieldy. Although we do
not expect the population structure effects that we are considering
will be substantial, we believe that they should be considered for
mixed DNA stains to the same extent that they are considered for single
stains.

Likelihood Ratios 

Likelihood ratios have been recognized by authors of several recent
books as the appropriate way of interpreting evidence (6-11). At a
trial there will be alternative hypotheses or propositions about who
contributed to this evidence: the prosecution will have proposition
$H_p$ and here we will suppose there is a single alternative
proposition $H_d$. The likelihood ratio LR is

$$LR = \frac{\Pr(\text{Evidence} \mid H_p)}{\Pr(\text{Evidence} \mid H_d)}$$

The DNA evidence $E$ for mixed-stain cases is the set of alleles found
among all the people who have either been typed directly or whose type
is inferred because they are considered to have contributed to the
stain. Previously (3) we took $E$ to mean only the alleles in the
stain, but the addition now of the alleles from people who may have
been typed even though they are hypothesized not [...] of population
structure.

We will make a distinction between the genetic profile, which is simply
a listing of the distinct alleles in the mixture, and the statistical
profile, which is a list of all $2n$ alleles when there are $n$
contributors. These two profiles will be different whenever some
contributors are homozygous, or when some contributors share alleles.
We will ignore the possibility of null alleles so that only homozygous
individuals contribute a single allele to a genetic profile.

We will use much of our previous notation (3), and repeat our 
observation that the interpretation of a mixed stain genetic profile 
requires a specification of the known contributors to the profile and 
of the number of unknown contributors. We will derive results for 
single loci and then multiply likelihood ratios over loci. 

As an example, suppose the evidentiary sample in a single-perpetrator
rape case shows three alleles $a$, $b$, $c$ at some locus. The sample
was recovered from the victim's person, she was found to be of type
$ab$ and a suspect was found to be of type $cc$. The prosecution
proposition is likely to be $H_p$: "The victim and the suspect were the
only contributors to the sample," and a likely alternative proposition
is $H_d$: "The victim and some unknown man were the only contributors
to the sample." The usual solution (2,3) for this situation is

$$LR = \frac{1}{p_c(2p_a + 2p_b + p_c)} \tag{1}$$

where the $p$'s are the allele frequencies. We now derive this result
from the perspective of this paper, first with population structure
ignored.

Under proposition $H_p$ only the victim and suspect are involved and
they have both been typed. The DNA evidence is therefore the genotype
pair $(ab, cc)$. We write the probability of this pair as
$\Pr(ab, cc) = 2\Pr(abcc)$. The approach we are taking assigns
probabilities to sets of alleles without regard to the arrangement of
alleles among individuals, but we do need a factor of "2" for the
heterozygous victim. Had the victim been $ab$ and the suspect $bc$ we
would have required the probability $4\Pr(abbc)$ since there are then
two heterozygotes. When population structure is ignored, as it was
previously (3), the probability of a set of alleles is just the product
of frequencies of the separate alleles, so $\Pr(abcc) = p_a p_b p_c^2$.
The numerator of LR is, therefore,

$$\Pr(E \mid H_p) = 2 p_a p_b p_c^2 \tag{2}$$

Note that, because the victim and suspect are both known individuals,
there is no need to consider the 2! orders of these two people as was
erroneously done in the first printing of (7).

Under proposition $H_d$ there are three people to consider: the suspect
of genotype $cc$ who did not contribute to the sample, and the victim
of type $ab$ plus the perpetrator of unknown genotype who both did
contribute to the sample. Examination of the profiles of the sample and
the victim shows that the unknown man must have allele $c$ and may also
have alleles $a$, $b$ or $c$. There are a total of six alleles in $E$,
and the probability is
$\Pr(ab, cc, ac) + \Pr(ab, cc, bc) + \Pr(ab, cc, cc)$ or
$4\Pr(aabccc) + 4\Pr(abbccc) + 2\Pr(abcccc)$. The denominator of LR is

$$\Pr(E \mid H_d) = 4 p_a^2 p_b p_c^3 + 4 p_a p_b^2 p_c^3 + 2 p_a p_b p_c^4 \tag{3}$$

The factors of 2 or 4 are because of the one or two heterozygotes.
Dividing Eq 2 by Eq 3 leads to the previously known result given in
Eq 1.
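
The derivation of Eq 1 from Eqs 2 and 3 can be checked by brute force. The following is a minimal sketch, assuming Hardy-Weinberg proportions and independent individuals (population structure ignored); the function names and the dictionary of frequencies are illustrative choices, not part of the original treatment.

```python
from itertools import combinations_with_replacement

def genotype_prob(genotype, freq):
    """Hardy-Weinberg probability of an unordered genotype, e.g. ('a', 'c')."""
    x, y = genotype
    return freq[x] ** 2 if x == y else 2 * freq[x] * freq[y]

def lr_victim_plus_unknown(freq, victim=("a", "b"), suspect=("c", "c")):
    """LR for H_p (victim + suspect) against H_d (victim + one unknown)."""
    stain = set(victim) | set(suspect)
    typed = genotype_prob(victim, freq) * genotype_prob(suspect, freq)
    numerator = typed  # Eq 2: both typed people account for the stain
    denominator = 0.0
    for unknown in combinations_with_replacement(sorted(stain), 2):
        # Keep unknown genotypes that, with the victim, show exactly the stain.
        if set(victim) | set(unknown) == stain:
            denominator += typed * genotype_prob(unknown, freq)  # Eq 3 terms
    return numerator / denominator

freq = {"a": 0.1, "b": 0.1, "c": 0.1}
lr = lr_victim_plus_unknown(freq)
closed_form = 1 / (freq["c"] * (2 * freq["a"] + 2 * freq["b"] + freq["c"]))  # Eq 1
print(lr, closed_form)
```

The enumeration keeps the three unknown genotypes $ac$, $bc$ and $cc$, so the two printed values agree up to floating-point rounding.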

It will be helpful to modify this example before proceeding further.
Suppose now the sample is not from the victim's person (e.g., it may be
from discarded clothing) and the alternative to $H_p$ is specified as
$H_d$: "Two unknown people were the contributors to the sample." Under
this proposition, there are four people involved: the victim and
suspect, neither of whom contributed to the sample, and two unknown
people who were the contributors. These last two people must have
alleles $abc$ between them but cannot have any other alleles. The
possible combinations of genotypes for the unknown people are
$(aa, bc)$, $(ab, ac)$, $(ab, bc)$, $(ab, cc)$, $(bb, ac)$, $(ac, ab)$,
$(ac, bb)$, $(ac, bc)$, $(bc, aa)$, $(bc, ab)$, $(bc, ac)$, and
$(cc, ab)$. These 12 combinations represent three distinct sets of
alleles: $aabc$, $abbc$, $abcc$, and each set has a coefficient of 12,
which is the number of ways of arranging the four alleles into two
different genotypes. The coefficient includes the effects of the two
orders of alleles within heterozygotes as well as the two orders of
different genotypes such as $aa, bc$ and $bc, aa$. The probabilities of
all eight alleles among the four people involved are obtained by
multiplying the probabilities $12\Pr(aabc)$, $12\Pr(abbc)$,
$12\Pr(abcc)$ by the probability $\Pr(ab, cc) = 2\Pr(abcc)$ of the
victim and suspect, and can be written as
$24\Pr(aaabbccc) + 24\Pr(aabbbccc) + 24\Pr(aabbcccc)$ so that

$$\Pr(E \mid H_d) = 24 p_a^3 p_b^2 p_c^3 + 24 p_a^2 p_b^3 p_c^3 + 24 p_a^2 p_b^2 p_c^4 \tag{4}$$

Dividing Eq 2 by Eq 4 gives the LR for this situation as

$$LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{1}{12 p_a p_b p_c (p_a + p_b + p_c)} \tag{5}$$

as has been given before (3).
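
The count of 12 genotype combinations and the coefficient of 12 per allele set can be reproduced by enumeration. This is an illustrative sketch only: alleles are represented as single characters, and the helper name `lr_two_unknowns` is an assumption made for the example.

```python
from collections import Counter
from itertools import product

ALLELES = "abc"

# Every ordered assignment of four alleles to two unknown people
# (positions 0-1 are person 1, positions 2-3 are person 2) in which
# all three stain alleles appear.
arrangements = [t for t in product(ALLELES, repeat=4) if set(t) == set(ALLELES)]

# Number of arrangements per allele set: the coefficient 12 in the text.
coefficients = Counter("".join(sorted(t)) for t in arrangements)

# Ordered pairs of unordered genotypes: the 12 combinations listed above.
genotype_pairs = {
    ("".join(sorted(t[:2])), "".join(sorted(t[2:]))) for t in arrangements
}

def lr_two_unknowns(pa, pb, pc):
    """Eq 2 divided by Eq 4, with population structure ignored."""
    typed = 2 * pa * pb * pc**2  # Pr(ab, cc) for victim and suspect
    unknowns = sum(
        n * pa ** s.count("a") * pb ** s.count("b") * pc ** s.count("c")
        for s, n in coefficients.items()
    )
    return typed / (typed * unknowns)

print(dict(coefficients), len(genotype_pairs))
print(lr_two_unknowns(0.1, 0.1, 0.1))
```

Because the typed-person term appears in both numerator and denominator, it cancels, which is why Eq 5 depends only on the alleles of the unknown contributors.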

We now modify the solutions in Eqs 1 and 5 to accommodate the situation
where all people, the victim, the suspect and (under $H_d$) the unknown
person(s), belong to the same subpopulation. Probabilities for the
genotype(s) of the unknown person(s) must take into account the
knowledge that two people in this subpopulation have been found to have
genotypes $ab$ and $cc$.

For both scenarios, $H_p$ is that only the victim and suspect were the
contributors to the sample. We will show that the required term
$\Pr(abcc)$ is given by

$$\Pr(abcc) = \frac{[(1-\theta)p_a][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c + \theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)}$$

where $\theta$ is the coancestry coefficient in the subpopulation to
which the victim and suspect both belong.

For the denominator in the first scenario, which is that the victim and
an unknown person contributed to the sample but the suspect did not,
there are three people and six alleles to consider. We will show, for
example, that

$$\Pr(aabccc) = \frac{[(1-\theta)p_a][(1-\theta)p_a + \theta][(1-\theta)p_b][(1-\theta)p_c][(1-\theta)p_c + \theta][(1-\theta)p_c + 2\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

where $\theta$ is the coancestry coefficient for the subpopulation to
which all three people belong. These expressions lead to

$$LR = \frac{(1+3\theta)(1+4\theta)}{[(1-\theta)p_c + 2\theta][(1-\theta)(2p_a + 2p_b + p_c) + 7\theta]} \tag{6}$$

which reduces correctly to Eq 1 when $\theta = 0$.

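For the second scenario, where both contributors to the sample are
unknown, we will show that

$$\Pr(E \mid H_d) = 24X[(1-\theta)(p_a + p_b + p_c) + 7\theta]$$

where

$$X = \frac{[(1-\theta)p_a][(1-\theta)p_a + \theta][(1-\theta)p_b][(1-\theta)p_b + \theta][(1-\theta)p_c][(1-\theta)p_c + \theta][(1-\theta)p_c + 2\theta]}{(1-\theta)(1)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}$$

so that LR becomes

$$LR = \frac{(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}{12[(1-\theta)p_a + \theta][(1-\theta)p_b + \theta][(1-\theta)p_c + 2\theta][(1-\theta)(p_a + p_b + p_c) + 7\theta]}$$

and this reduces to Eq 5 when $\theta = 0$.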

What is the numerical effect of using Eq 6 instead of Eq 1? When allele
frequencies are all relatively small at 0.1 and $\theta$ has the
relatively high value of 0.03, the LR drops from 20 to 12.33.
Multiplying values from Eq 6 over several loci can give quite large LR
values, but they will be less than those from Eq 1 in which population
structure is ignored.
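
The two structured likelihood ratios derived above are simple enough to code directly. The sketch below is illustrative: the function names are hypothetical, and it assumes $0 \le \theta < 1$ and that everyone involved belongs to the same subpopulation.

```python
def lr_one_unknown(pa, pb, pc, theta):
    """Eq 6: victim ab and suspect cc typed; H_d has one unknown contributor."""
    num = (1 + 3 * theta) * (1 + 4 * theta)
    den = ((1 - theta) * pc + 2 * theta) * (
        (1 - theta) * (2 * pa + 2 * pb + pc) + 7 * theta
    )
    return num / den

def lr_two_unknowns(pa, pb, pc, theta):
    """Second scenario: H_d has two unknown contributors, both typed people excluded."""
    num = (1 + 3 * theta) * (1 + 4 * theta) * (1 + 5 * theta) * (1 + 6 * theta)
    den = (
        12
        * ((1 - theta) * pa + theta)
        * ((1 - theta) * pb + theta)
        * ((1 - theta) * pc + 2 * theta)
        * ((1 - theta) * (pa + pb + pc) + 7 * theta)
    )
    return num / den

# With theta = 0 the expressions reduce to Eqs 1 and 5.
print(lr_one_unknown(0.1, 0.1, 0.1, 0.0))
print(lr_two_unknowns(0.1, 0.1, 0.1, 0.0))
```

Single-locus values obtained this way would be multiplied over loci, as described earlier.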

The approach we have just illustrated is as follows. Alternative
propositions are needed that specify the numbers of contributors to the
evidentiary sample. Some of these contributors will be known and typed
people, and some will be unknown people. Those contributors, together
with any typed people who are known (under the proposition) not to be
contributors, contain among them a set of alleles whose probability can
be written down as the product of the separate allele proportions or as
a more complicated function that incorporates the population structure
parameter $\theta$. There is also a factor of 2 for each known
heterozygote, and a term for the number of ways of arranging all $2x$
alleles from $x$ unknown people into pairs. There may be different sets
of alleles from unknown people under some propositions, and the
probabilities for these sets must be added together. The likelihood
ratio is the ratio of probabilities under alternative propositions. As
additional examples, we list the results for each of the common cases
described in (7) in the Appendix.

Although it is possible to follow the above line of argument for any
situation, we prefer to work with a general approach amenable to
automatic (computer-based) calculation as we did previously (3). This
will relieve the forensic scientist of the need for lengthy
calculations in the same way that computer programs such as POPSTATS
can be used for other DNA calculations. We will lay out the logic
behind this general approach even though we anticipate the routine use
of computer packages.

In order to do this we need to break the problem into two parts: we
list the alleles, with their multiplicities, carried by the unknown
contributors under $H_p$ or $H_d$, and then we determine the
probabilities of the allele sets. The two probabilities lead to the
likelihood ratio. It is our use of the theory in (4) that allows us to
concentrate on alleles rather than genotypes.

Notation 

Much of the complexity in dealing with mixtures can be removed by a
mnemonic notation, as laid out in Table 1. We find it very helpful to
label the alleles at a locus A by the letters $A_i$. There are sets of
alleles (not necessarily distinct: the statistical profiles) that occur
in the crime sample ($C$). For a particular proposition there are
alleles ($T$) carried by typed people declared to be contributors and
alleles ($U$) carried by unknown contributors to the sample, and there
are alleles ($V$) carried by any people declared not to have
contributed to the sample. There are corresponding sets of distinct
alleles, the genetic profiles, and these sets are indicated by a $g$
subscript. Note that the same person may be declared to be a
contributor to the sample under one proposition, and declared not to be
a contributor under another proposition.

Table 1. Notation.

Alleles in the profile of the evidence sample.
  $C$      The set of alleles in the evidence profile.
  $C_g$    The set of distinct alleles in the evidence profile.
  $n_C$    The known number of contributors to $C$.
  $h_C$    The unknown number of heterozygous contributors.
  $c$      The known number of distinct alleles in $C_g$.
  $c_i$    The unknown number of copies of allele $A_i$ in $C$.

Alleles from typed people that $H$ declares to be contributors.
  $T$      The set of alleles carried by the declared contributors to $C$.
  $T_g$    The set of distinct alleles carried by the declared contributors.
  $n_T$    The known number of declared contributors to $C$.
  $h_T$    The known number of heterozygous declared contributors.
  $t$      The known number of distinct alleles in $T_g$ carried by
           $n_T$ declared contributors.
  $t_i$    The known number of copies of allele $A_i$ in $T$.

Alleles from unknown people that $H$ declares to be contributors.
  $U$      The sets of alleles carried by the unknown contributors to $C$.
  $x$      The specified number of unknown contributors to $C$:
           $n_C = n_T + x$.
  $c - t$  The known number of alleles that are required to be in $U$.
  $r$      The known number of alleles in $U$ that can be any allele in
           $C_g$: $r = 2x - (c - t)$.
  $n_x$    The number of different sets of alleles $U$:
           $n_x = (c + r - 1)!/[(c - 1)!\,r!]$.
  $r_i$    The unknown number of copies of $A_i$ among the $r$
           unconstrained alleles in $U$.
  $u_i$    The unknown number of copies of $A_i$ in $U$:
           $c_i = t_i + u_i$.
           If $A_i$ is in $C_g$ but not in $T_g$: $u_i = r_i + 1$.
           If $A_i$ is in $C_g$ and also in $T_g$: $u_i = r_i$.

Alleles from typed people that $H$ declares to be non-contributors.
  $V$      The set of alleles carried by typed people declared not to be
           contributors to $C$.
  $n_V$    The known number of people declared not to be contributors to $C$.
  $h_V$    The known number of heterozygous declared non-contributors.
  $v_i$    The known number of copies of $A_i$ in $V$:
           $\sum_i v_i = 2n_V$.

Allele Sets 

The alleles in the evidence profile are carried by typed people
declared to be contributors or unknown people, so that $C$ is the
combination (union) of sets $T$ and $U$. For a given proposition, the
probability of the evidence profile depends also on the alleles carried
by people who have been typed but are declared by that proposition not
to have contributed to the profile. For a proposition in which there
are $x$ unknown contributors, we write the probability as
$P_x(T, U, V)$ in an extension of our previous notation (3). Note,
however, that the present probability is for all the alleles in the
sets $T$, $U$, $V$ whereas the probability in (3) was for only the
alleles in $U$ conditional on those in $T$. In the total set of
$2n_C + 2n_V = 2n_T + 2x + 2n_V$ alleles, we see from Table 1 that
allele $A_i$ occurs $c_i + v_i = t_i + u_i + v_i$ times. We add the
probabilities over all the $n_x = (c + r - 1)!/[(c - 1)!\,r!]$ distinct
sets of $r_i$. As listed in Table 1, $c$ is the number of distinct
alleles in $C_g$ and $r$ is the number of alleles carried by unknown
people that can be any one of these $c$ alleles.

Generating the $n_x$ sets $U$ is a two-stage process. Some of the
alleles in each set must be present: these are the alleles in the set
$C_g$ that are not in set $T_g$. Other alleles are not under this
constraint because they already occur in $T_g$, and there are $r_i$
copies of $A_i$ alleles in this unconstrained set. It is a
straightforward computing task to let $r_1$ range over the integers
$0, 1, \ldots, r$, then let $r_2$ range over the integers
$0, 1, \ldots, r - r_1$, then let $r_3$ range over the integers
$0, 1, \ldots, r - r_1 - r_2$, and so on. The final count $r_c$ is
obtained by subtracting the sum of $r_1, r_2, \ldots, r_{c-1}$ from
$r$. The total number of $A_i$ alleles in set $U$ is
$\sum_{i=1}^{c} u_i = 2x$ where $u_i = r_i$ for those alleles in both
$C_g$ and $T_g$, and $u_i = r_i + 1$ for alleles in $C_g$ but not in
$T_g$.

For any ordering of the $2x = \sum_i u_i$ alleles in $U$, successive
pairs of alleles can be taken to represent genotypes and there are
$(2x)!/\prod_{i=1}^{c} u_i!$ possible orderings. This is the number of
possible sets of unknown genotypes that have each allelic set $U$.
Although it is the genotypes that correspond to the $x$ unknown people,
it is the set of $2x$ alleles that we use to determine the probability,
in combination with the $2n_T + 2n_V$ alleles among the known people.
Because the $n_T$ typed people all have specified genotypes, we
consider not all possible orderings of the $2n_T$ alleles but just a
factor of 2 for each heterozygote. Similarly, we need a factor of 2 for
each heterozygote among the set of $n_V$ non-contributors (this
corrects erroneous statements in (7)).

For the single-perpetrator rape example above, now writing alleles
$a$, $b$, $c$ as $A_1$, $A_2$, $A_3$, the evidence sample set is
$C_g = (A_1, A_2, A_3)$ and $c = 3$. Under $H_d$ (the victim and one
unknown person contributed to the mixed stain) the set from known
people is $T = (A_1, A_2)$ and $n_T = 1$, $t = 2$. The set from the
unknown person must contain $A_3$ since $c - t = 1$ and $x = r = 1$,
but can also contain any of the three alleles in set $C_g$: i.e. there
are $n_x = 3$ different sets of alleles from the unknown person. We
also considered the situation where $H_d$ is that the evidence stain
was from two unknown people, $x = 2$, and no known contributors,
$n_T = 0$. Now $U$ must contain all three alleles $A_1$, $A_2$, $A_3$,
$c - t = 3$, and the $r = 1$ other allele can be any of these three.
There are $n_x = 3$ different sets $U$. The counts of alleles $A_1$,
$A_2$, $A_3$ in these sets are, therefore, (2,1,1), (1,2,1), (1,1,2)
and each of these can be ordered in $4!/(2!\,1!\,1!) = 12$ ways.
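
The two-stage generation of the sets $U$ described above can be sketched as a short recursive enumeration. The code is illustrative rather than the authors' software: `compositions` and `unknown_allele_sets` are hypothetical helper names, and the input is simply a flag per stain allele saying whether it is carried by a typed contributor.

```python
from math import comb, factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def unknown_allele_sets(in_typed, x):
    """Yield (u, orderings) for each allele set U of x unknown contributors.

    in_typed[i] is True when allele A_i is carried by a typed contributor,
    i.e. when A_i is in T_g and therefore not required to be in U.
    """
    c = len(in_typed)
    required = [0 if known else 1 for known in in_typed]
    r = 2 * x - sum(required)  # unconstrained alleles, r = 2x - (c - t)
    if r < 0:
        return  # too few unknown people to explain the stain
    for rs in compositions(r, c):
        u = tuple(ri + req for ri, req in zip(rs, required))
        orderings = factorial(2 * x)
        for ui in u:
            orderings //= factorial(ui)  # (2x)! / prod(u_i!)
        yield u, orderings

# Victim A1A2 typed, one unknown contributor: three possible sets.
one_unknown = list(unknown_allele_sets((True, True, False), x=1))
# No typed contributors, two unknown contributors: three sets of 12 orderings.
two_unknown = list(unknown_allele_sets((False, False, False), x=2))
print(one_unknown)
print(two_unknown)
```

In both cases the number of sets equals $n_x = (c + r - 1)!/[(c-1)!\,r!]$, which `comb(c + r - 1, r)` computes directly.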

Allele Dependencies 

We now consider how to attach probabilities to the sets of alleles
discussed in the last section. We suppose that a state of evolutionary
equilibrium has been established, so that the probabilities of sets of
alleles can be found from the Dirichlet distribution (13). This
distribution depends on allele proportions and the coancestry
coefficient. The statement that the relationship between pairs of
alleles in a subpopulation can be quantified by the coancestry
coefficient $\theta$ has several interpretations (12). Here we will
take it to mean that the probability that two alleles taken at random
from the subpopulation are both of type $A_i$ is
$p_i^2 + \theta p_i(1 - p_i)$, where $p_i$ is the allele frequency of
$A_i$ averaged over subpopulations. When allele frequencies over
populations follow the Dirichlet distribution, the probability of a set
of subpopulation frequencies $\{\tilde{p}_i\}$ for alleles $A_i$ is
given by the Dirichlet density

$$\frac{\Gamma(\gamma_{\cdot})}{\prod_i \Gamma(\gamma_i)} \prod_i \tilde{p}_i^{\,\gamma_i - 1}$$

where

$$\gamma_i = (1 - \theta)p_i/\theta, \qquad \gamma_{\cdot} = \sum_i \gamma_i = (1 - \theta)/\theta$$

and $\Gamma$ is the gamma function with the property
$\Gamma(x + 1) = x\Gamma(x)$. The great advantage of this Dirichlet
distribution is that it allows the probability of any set of alleles to
be found very simply. If the set has $m_i$ copies of $A_i$, then the
probability is

$$\Pr\Big(\prod_i A_i^{m_i}\Big) = \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + m_{\cdot})} \prod_i \frac{\Gamma(\gamma_i + m_i)}{\Gamma(\gamma_i)}$$

where $m_{\cdot} = \sum_i m_i$. This is the result upon which Eqs 4.10
in the 1996 NRC report (2) are based (4).
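
The set probability can be evaluated either through gamma functions or through the equivalent finite products obtained from $\Gamma(x + 1) = x\Gamma(x)$. The sketch below compares the two; the function names are hypothetical, and it assumes $0 < \theta < 1$ for the gamma form and that the supplied frequencies belong to a locus whose frequencies sum to one (alleles with zero copies can be omitted).

```python
from math import exp, lgamma

def set_prob_gamma(counts, freqs, theta):
    """Gamma-function form of the probability of a set of alleles."""
    gamma_total = (1 - theta) / theta
    log_p = lgamma(gamma_total) - lgamma(gamma_total + sum(counts))
    for m, p in zip(counts, freqs):
        gamma_i = (1 - theta) * p / theta
        log_p += lgamma(gamma_i + m) - lgamma(gamma_i)
    return exp(log_p)

def set_prob_product(counts, freqs, theta):
    """Equivalent product form; also valid at theta = 0."""
    numerator = 1.0
    for m, p in zip(counts, freqs):
        for j in range(m):
            numerator *= (1 - theta) * p + j * theta
    denominator = 1.0
    for j in range(sum(counts)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator

# Probability of the allele set abcc with all frequencies 0.1.
print(set_prob_gamma([1, 1, 2], [0.1, 0.1, 0.1], 0.03))
print(set_prob_product([1, 1, 2], [0.1, 0.1, 0.1], 0.03))
```

For two copies of a single allele the product form gives $p_i[(1-\theta)p_i + \theta] = p_i^2 + \theta p_i(1 - p_i)$, matching the interpretation of $\theta$ adopted above.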

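In our mixed-stain situation, there are $t_i + u_i + v_i$ copies of
allele $A_i$, and the required probability is

$$P_x(T, U, V) = \sum_{\{r_i\}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_i u_i!} \, \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 2x + 2n_T + 2n_V)} \prod_i \frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} \tag{9}$$

Summing over the $\{r_i\}$ values accounts for all sets $U$.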

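Although this is a very compact expression, implementing it in a
computer program is easier after some expansion. From the properties of
the gamma function $\Gamma(\cdot)$ and the definition of $\gamma$,

$$\frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 2x + 2n_T + 2n_V)} = \frac{\theta^{2x + 2n_T + 2n_V}}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1 - \theta) + j\theta]}$$

$$\frac{\Gamma(\gamma_i + t_i + u_i + v_i)}{\Gamma(\gamma_i)} = \frac{\prod_{j=0}^{t_i + u_i + v_i - 1} [(1 - \theta)p_i + j\theta]}{\theta^{t_i + u_i + v_i}}$$

We can also make the summation over $\{r_i\}$ values more explicit by
showing the range of values of each $r_i$. Equation 9 becomes

$$P_x(T, U, V) = \sum_{r_1 = 0}^{r} \sum_{r_2 = 0}^{r - r_1} \cdots \sum_{r_{c-1} = 0}^{r - r_1 - \cdots - r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \, \frac{\prod_{i=1}^{c} \prod_{j=0}^{t_i + u_i + v_i - 1} [(1 - \theta)p_i + j\theta]}{\prod_{j=0}^{2x + 2n_T + 2n_V - 1} [(1 - \theta) + j\theta]} \tag{10}$$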

Likelihood ratios are formed as the ratios of two such probabilities,
and we note that people declared to be contributors under one
proposition may be declared to be non-contributors under the other. In
other words, every person typed is declared to be either a contributor
or a non-contributor. The number of people typed, and the alleles they
carry among them, are the same for every proposition. For this reason,
$n_T + n_V$, $h_T + h_V$ and $t_i + v_i$ will be the same in the
probabilities for each proposition. The term $2^{h_T + h_V}$ will
cancel out of the likelihood ratio, as will some of the terms in the
products in the numerator and denominator of the right hand side of
Eq 10.

If population structure is ignored, and $\theta$ is set to zero, Eq 10
reduces to

$$P_x(T, U, V) = \sum_{r_1 = 0}^{r} \sum_{r_2 = 0}^{r - r_1} \cdots \sum_{r_{c-1} = 0}^{r - r_1 - \cdots - r_{c-2}} \frac{(2x)!\,2^{h_T + h_V}}{\prod_{i=1}^{c} u_i!} \prod_{i=1}^{c} p_i^{t_i + u_i + v_i}$$

This is equivalent to Eq 5 in our earlier treatment (3) and may be in a
form more convenient for computation. Because of cancellation of terms
in the likelihood ratio, it can be seen that $n_T$, $n_V$, $h_T$,
$h_V$, $v_i$ are not used when $\theta = 0$. In this case the value of
LR depends only on the numbers and frequencies of the alleles carried
by unknown contributors. There is no need to consider the genotypes of
typed people, whether or not they contribute to the evidence sample.
This is different to the situation where population structure is taken
into account: then the genotypes of all typed people are needed.

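In the degenerate case where there are no typed people, contributors or
non-contributors, $t_i = v_i = 0$, then $u_i = c_i$ and the sum for
$\theta = 0$ is just a multinomial expansion:

$$P_x(\phi, U, \phi) = \sum_{\{r_i\}} \frac{(2x)!}{\prod_{i=1}^{c} u_i!} \prod_{i=1}^{c} p_i^{u_i}$$
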

Examples 

We now consider an example where the evidence sample
$C_g = (A_1 A_2 A_3 A_4)$ ($c = 4$) is known to be from two
perpetrators but only one suspect, of type $A_1A_2$, has been
apprehended. Proposition $H_p$ is that this suspect and one unknown
person were the contributors, so $T = (A_1A_2)$ ($n_T = 1$, $t = 2$)
and $U$ has only one possibility ($n_x = 1$): the two alleles
$A_3A_4$. There are no known non-contributors, so $V = \phi$,
$n_V = 0$ where $\phi$ denotes the empty set. The probability under
$H_p$ is

$$P_1((A_1A_2), (A_3A_4), (\phi)) = \frac{2!\,2^1}{1!\,1!} \, \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 4)} \prod_{i=1}^{4} \frac{\Gamma(\gamma_i + 1)}{\Gamma(\gamma_i)} = \frac{4(1 - \theta)^3 p_1 p_2 p_3 p_4}{(1 + \theta)(1 + 2\theta)}$$

Proposition $H_d$ is that there are no known contributors, $T = \phi$,
$n_T = 0$, there is one person known not to be a contributor,
$V = (A_1A_2)$, $n_V = 1$, and there are two unknown contributors who
must carry all four alleles between them. Once again, there is only one
possible set $U = (A_1A_2A_3A_4)$, $n_x = 1$ and the probability is

$$P_2((\phi), (A_1A_2A_3A_4), (A_1A_2)) = \frac{4!\,2^1}{1!\,1!\,1!\,1!} \, \frac{\Gamma(\gamma_{\cdot})}{\Gamma(\gamma_{\cdot} + 6)} \prod_{i=1}^{4} \frac{\Gamma(\gamma_i + u_i + v_i)}{\Gamma(\gamma_i)} = \frac{48(1 - \theta)^3 p_1 p_2 p_3 p_4 [(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}{(1 + \theta)(1 + 2\theta)(1 + 3\theta)(1 + 4\theta)}$$

The likelihood ratio for this example is, therefore,

$$LR = \frac{(1 + 3\theta)(1 + 4\theta)}{12[(1 - \theta)p_1 + \theta][(1 - \theta)p_2 + \theta]}$$

which reduces to $1/(12 p_1 p_2)$ when $\theta = 0$ as has been given
previously (1,3).

A more complicated example is for a rape committed by three men.
Suppose that the evidence sample has alleles $(A_1, A_2, A_3, A_4)$,
the victim is of type $A_1A_2$ and a single suspect has type $A_3A_3$.
Then two alternative propositions are $H_p$: "The victim, the suspect
and two unknown men contributed to the sample," and $H_d$: "The victim
and three unknown men contributed to the sample."

The evidence genetic profile has $c = 4$ alleles
$C_g = (A_1, A_2, A_3, A_4)$. Under proposition $H_p$, there are
$t = 3$ distinct alleles $T_g = (A_1, A_2, A_3)$ from two known
contributors and no alleles from people known not to be contributors,
$V = \phi$. For $x = 2$ unknown contributors, the number of sets of
$r = 3$ alleles these people can carry in addition to the $A_4$ allele
they must have among them is $n_x = 6!/(3!\,3!) = 20$. The counts
$u_1, u_2, u_3, u_4$ for all four alleles $A_1, A_2, A_3, A_4$ among
the two unknown men, together with the multiplicities
$(4!\,2^1)/(u_1!\,u_2!\,u_3!\,u_4!)$, are

0,0,0,4:2    0,0,1,3:8    0,0,2,2:12   0,0,3,1:8    0,1,0,3:8
0,1,1,2:24   0,1,2,1:24   0,2,0,2:12   0,2,1,1:24   0,3,0,1:8
1,0,0,3:8    1,0,1,2:24   1,0,2,1:24   1,1,0,2:24   1,1,1,1:48
1,2,0,1:24   2,0,0,2:12   2,0,1,1:24   2,1,0,1:24   3,0,0,1:8

Under proposition $H_d$ there are $t = 2$ alleles, $T = (A_1, A_2)$,
from a known contributor (the victim) and two alleles
$V = (A_3, A_3)$ from a person (the suspect) known not to be a
contributor. For $x = 3$ unknown contributors, the number of sets of
$r = 4$ alleles these people can carry in addition to the $A_3$, $A_4$
alleles they must have among them is $n_x = 7!/(4!\,3!) = 35$. The
counts $u_1, u_2, u_3, u_4$ for $A_1, A_2, A_3, A_4$, with coefficients
$(6!\,2^1)/(u_1!\,u_2!\,u_3!\,u_4!)$, for the 35 possible sets are:

0,0,1,5:12    0,0,2,4:30    0,0,3,3:40    0,0,4,2:30    0,0,5,1:12
0,1,1,4:60    0,1,2,3:120   0,1,3,2:120   0,1,4,1:60    0,2,1,3:120
0,2,2,2:180   0,2,3,1:120   0,3,1,2:120   0,3,2,1:120   0,4,1,1:60
1,0,1,4:60    1,0,2,3:120   1,0,3,2:120   1,0,4,1:60    1,1,1,3:240
1,1,2,2:360   1,1,3,1:240   1,2,1,2:360   1,2,2,1:360   1,3,1,1:240
2,0,1,3:120   2,0,2,2:180   2,0,3,1:120   2,1,1,2:360   2,1,2,1:360
2,2,1,1:360   3,0,1,2:120   3,0,2,1:120   3,1,1,1:240   4,0,1,1:60

For each proposition, the multiplicities are multiplied by the
appropriate Dirichlet probabilities and the 20 or 35 terms added
together. Obviously this is a task better suited for a computer.
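
As a rough illustration of how the 20-term and 35-term sums might be automated, the sketch below evaluates $P_x(T, U, V)$ in the product form of Eq 10 and forms the likelihood ratio for the three-man example. It is not the authors' software: the function names, the argument layout (per-allele counts `t` and `v` over the stain alleles, and `h_known` for $h_T + h_V$) and the example frequencies are all assumptions, and typed people are assumed to carry only alleles seen in the stain.

```python
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def set_prob(counts, freqs, theta):
    """Dirichlet probability of an allele set, written as finite products."""
    numerator = 1.0
    for m, p in zip(counts, freqs):
        for j in range(m):
            numerator *= (1 - theta) * p + j * theta
    denominator = 1.0
    for j in range(sum(counts)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator

def p_x(freqs, theta, x, t, v, h_known):
    """Sum over all allele sets U of x unknown contributors (Eq 10 sketch)."""
    required = [1 if ti == 0 else 0 for ti in t]  # stain alleles absent from T
    r = 2 * x - sum(required)
    if r < 0:
        return 0.0
    total = 0.0
    for rs in compositions(r, len(freqs)):
        u = [ri + req for ri, req in zip(rs, required)]
        coefficient = factorial(2 * x) * 2**h_known
        for ui in u:
            coefficient //= factorial(ui)
        counts = [ti + ui + vi for ti, ui, vi in zip(t, u, v)]
        total += coefficient * set_prob(counts, freqs, theta)
    return total

# Three-man example: victim A1A2, suspect A3A3, stain A1 A2 A3 A4.
freqs = [0.10, 0.20, 0.15, 0.05]  # hypothetical frequencies of A1..A4
theta = 0.03
hp = p_x(freqs, theta, x=2, t=[1, 1, 2, 0], v=[0, 0, 0, 0], h_known=1)
hd = p_x(freqs, theta, x=3, t=[1, 1, 0, 0], v=[0, 0, 2, 0], h_known=1)
print(hp / hd)
```

The same function handles the earlier two-perpetrator example by changing `x`, `t` and `v`, which is the main reason for organising the calculation around allele counts rather than genotypes.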

Multiple Subpopulations

So far we have considered the situation where all people involved in
the evidence interpretation have been in the same subpopulation. Other
situations are likely, especially when victim and suspect belong to
different racial groups. The same sets of alleles are involved as
before, but now the probabilities need to be calculated separately for
the alleles within each subpopulation.

We begin by returning to our first example of a single-perpetrator rape
where the victim was of type $A_1A_2$, the suspect was of type
$A_3A_3$, and the evidence sample was $A_1A_2A_3$. If there was reason
to believe that the perpetrator was of the same racial type as the
suspect, but of a different type from the victim, then the victim's
alleles need to be separated from those of the suspect and, under
$H_d$, from the unknown perpetrator. Suppose that the victim belonged
to racial group 1, with coancestry $\theta_1$ for her subpopulation and
allele frequencies $p_1$, $p_2$ for $A_1$, $A_2$. Suppose also that the
suspect and perpetrator belong to racial group 2, with coancestry
coefficient $\theta_2$ for their subpopulation and allele frequencies
$q_1$, $q_2$, $q_3$ for alleles $A_1$, $A_2$, $A_3$. Suppose, further,
that there is zero coancestry between alleles in different racial
groups so that alleles in groups 1 and 2 can be treated independently.



Under $H_p$, the probability is

$$P_0((A_1A_2A_3A_3), (\phi), (\phi)) = 2(1 - \theta_1)p_1p_2 \times q_3[(1 - \theta_2)q_3 + \theta_2]$$

since the pair $A_1A_2$ from group 1 and the pair $A_3A_3$ from group 2
are treated separately. Under $H_d$, one of the three components of the
probability is

$$P_1((A_1A_2), (A_1A_3), (A_3A_3)) = 2(1 - \theta_1)p_1p_2 \times \frac{2(1 - \theta_2)q_1q_3[(1 - \theta_2)q_3 + \theta_2][(1 - \theta_2)q_3 + 2\theta_2]}{(1 + \theta_2)(1 + 2\theta_2)}$$

since the pair $A_1A_2$ from group 1 and the two pairs $A_1A_3$,
$A_3A_3$ from group 2 are treated separately. Equation 6 is replaced by

$$LR = \frac{(1 + \theta_2)(1 + 2\theta_2)}{[(1 - \theta_2)q_3 + 2\theta_2][(1 - \theta_2)(2q_1 + 2q_2 + q_3) + 3\theta_2]}$$

The general Eq 10 can be modified to allow for different
subpopulations. However, when any of the three sets $T$, $U$, $V$
contains alleles from different subpopulations, as was the case in the
example just considered, it will be necessary to introduce further
notation. Each of the counts $t_i$, $u_i$, $v_i$ would need to be split
into a component for each subpopulation, and the multiplicity
coefficients would also need to be derived separately for each
subpopulation.
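
For the two-group rape example above, the separation of alleles by subpopulation can be checked numerically. In this sketch the set probabilities are computed independently within each group and then multiplied; the function names and the numerical values are hypothetical, and zero coancestry between groups is assumed as in the text.

```python
def set_prob(counts, freqs, theta):
    """Dirichlet probability of an allele set within one subpopulation."""
    numerator = 1.0
    for m, p in zip(counts, freqs):
        for j in range(m):
            numerator *= (1 - theta) * p + j * theta
    denominator = 1.0
    for j in range(sum(counts)):
        denominator *= (1 - theta) + j * theta
    return numerator / denominator

def lr_closed_form(q, theta2):
    """Replacement for Eq 6 when the victim is in a different group."""
    q1, q2, q3 = q
    num = (1 + theta2) * (1 + 2 * theta2)
    den = ((1 - theta2) * q3 + 2 * theta2) * (
        (1 - theta2) * (2 * q1 + 2 * q2 + q3) + 3 * theta2
    )
    return num / den

def lr_from_components(p, theta1, q, theta2):
    """Build the same LR from the per-group set probabilities."""
    victim = 2 * set_prob([1, 1], p, theta1)  # A1A2 in group 1
    numerator = victim * set_prob([0, 0, 2], q, theta2)  # suspect A3A3
    denominator = victim * (
        2 * set_prob([1, 0, 3], q, theta2)  # unknown A1A3 plus suspect
        + 2 * set_prob([0, 1, 3], q, theta2)  # unknown A2A3 plus suspect
        + set_prob([0, 0, 4], q, theta2)  # unknown A3A3 plus suspect
    )
    return numerator / denominator

p = [0.12, 0.08]  # group 1 frequencies of A1, A2 (illustrative)
q = [0.10, 0.20, 0.05]  # group 2 frequencies of A1, A2, A3 (illustrative)
print(lr_closed_form(q, 0.03), lr_from_components(p, 0.01, q, 0.03))
```

The victim's term cancels, so the result does not depend on $\theta_1$ or on the group 1 frequencies in this particular example.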

Discussion 

We offer this treatment of the effects of population structure on DNA
mixture calculations to complement two previous treatments: the effects
of population structure on single stains (2,4) and the interpretation
of mixed stains without population structure (1,3). Our study therefore
closes a gap in current DNA forensic interpretation.

Our treatment is based firmly on the use of likelihood ratios and the
accompanying need for conditional probabilities. There is no
alternative when the evidence is less than certain under the
proposition $H_p$. Conditional probabilities are necessary to
incorporate the known genetic nature of DNA profiles. The full meaning
of profiles cannot be found without accounting for the role of
evolution in shaping the probabilities of sets of profiles. The novel
feature of this study lies in accounting for the information contained
in the profiles of people who are declared not to have contributed to
the evidence profile. This has arisen for the situation of a suspect,
who is not excluded from the evidence profile, being declared not to be
a contributor under proposition $H_d$.

The arguments made for incorporating non-contributors can be extended.
Several people may be typed during the course of an investigation. Even
if they are excluded as being contributors, they provide information
for the probability calculations when they can be considered to belong
to the same subpopulation as (some of) the people not excluded. They
make their contribution to the calculation via allelic set $V$.

Our treatment has assumed a specific number of unknown contributors,
but we realize that this number is very likely not to be known.
Although some general statements about conservative assumptions can be
made (3), such as assuming large numbers of unknown people for loci
with few alleles and small numbers of unknown people for loci with many
alleles, we prefer not to formulate rules. Instead we recommend the
calculation of likelihood ratios [...] conservative results.

We have not allowed for unseen, or "null," alleles as has been done previously (2,3) because the move away from RFLP technology in forensic science has diminished the need for such a treatment. We have not considered other typing-system features such as intensity or peak height differences as these have been discussed elsewhere. However, we do consider that the approach described here is sufficiently flexible to allow the interpretation of many different mixed-stain DNA profiles.

Software for conducting the calculations described in this paper can be obtained directly from the World Wide Web page www.stat.ncsu.edu (click on "Statistical Genetics") or by sending email to weir@stat.ncsu.edu.

APPENDIX



In this Appendix we show the effects of population structure for each of the six common situations described in Chapter 7 of (7). A [...] In each case, setting $\theta = 0$ reduces the result to the one given in (7).

Case 1: Four-Allele Mixture, Heterozygous Victim, and Heterozygous Suspect

The victim is of type $A_3A_4$, the suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3A_4$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2A_3A_4$ and $n_T = 2$, $h_T = 2$, $t = 4$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the alleles from known contributors are $T = T_g = A_3A_4$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from unknown contributors are constrained to be $U = A_1A_2$ and $x = 1$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.
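The required probabilities are

$$H_p:\ P_0(\{A_1A_2A_3A_4\}, \phi, \phi) = \frac{2^2 (1-\theta)^4 p_1p_2p_3p_4}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\ P_1(\{A_3A_4\}, \{A_1A_2\}, \{A_1A_2\}) = \frac{2^2\,2!\,(1-\theta)^4 p_1p_2p_3p_4\,[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{2[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$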
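The Case 1 result can be checked numerically. The sketch below is a minimal illustration, not the authors' software: it assumes that the probability of a set of alleles is the product of factors $(1-\theta)p_i + j\theta$ over successive copies of each allele, divided by the product of $(1-\theta) + j\theta$ over all alleles in the set, which is the pattern visible in the expressions of this Appendix. The function names and the numerical inputs are hypothetical.

```python
from fractions import Fraction


def allele_set_prob(counts, freqs, theta):
    """Probability of an ordered set of alleles with the given copy counts."""
    num = Fraction(1)
    for allele, copies in counts.items():
        for j in range(copies):
            num *= (1 - theta) * freqs[allele] + j * theta
    den = Fraction(1)
    for j in range(sum(counts.values())):
        den *= (1 - theta) + j * theta
    return num / den


def case1_lr(p1, p2, p3, p4, theta):
    """Likelihood ratio for Case 1 built from the two proposition probabilities."""
    freqs = {1: p1, 2: p2, 3: p3, 4: p4}
    # H_p: victim A3A4 and suspect A1A2 contributed; two typed heterozygotes.
    hp = 2**2 * allele_set_prob({1: 1, 2: 1, 3: 1, 4: 1}, freqs, theta)
    # H_d: victim A3A4 plus an unknown A1A2 (2!/(1!1!) = 2 orderings);
    # the suspect A1A2 is typed but declared a non-contributor.
    hd = 2**2 * 2 * allele_set_prob({1: 2, 2: 2, 3: 1, 4: 1}, freqs, theta)
    return hp / hd


def case1_closed_form(p1, p2, theta):
    """Closed-form Case 1 likelihood ratio."""
    return (1 + 3 * theta) * (1 + 4 * theta) / (
        2 * ((1 - theta) * p1 + theta) * ((1 - theta) * p2 + theta)
    )


# Hypothetical inputs, exact rational arithmetic.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(1, 20)]
theta = Fraction(3, 100)
print(float(case1_lr(*p, theta)), float(case1_closed_form(p[0], p[1], theta)))
```

Exact fractions are used so that the two routes to the likelihood ratio can be compared for equality rather than to within rounding error.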



Case 2: Three-Allele Mixture, Homozygous Victim, and Heterozygous Suspect

The victim is of type $A_3A_3$, the suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = (A_1A_2A_3A_3)$, so $C_g = (A_1A_2A_3)$ and $c = 3$.

Under $H_p$, the alleles from known contributors are $T_g = A_1A_2A_3$ and $n_T = 2$, $h_T = 1$, $t = 3$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the allele from known contributors is $T_g = A_3$ and $n_T = 1$, $h_T = 0$, $t = 1$. The alleles from the unknown contributor are constrained to include $A_1A_2$, and $x = 1$, $r = 0$. The alleles from the person declared not to be a contributor are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

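The required probabilities are

$$H_p:\ P_0(\{A_1A_2A_3A_3\}, \phi, \phi) = \frac{2^1 (1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_3+\theta]}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\ P_1(\{A_3A_3\}, \{A_1A_2\}, \{A_1A_2\}) = \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_3+\theta][(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{2[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$

as it was for Case 1.

FIG. 1 [panels for Cases 1 to 6, each showing the alleles of the victim and/or suspect(s) alongside those of the sample]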

Case 3: Three-Allele Mixture, Heterozygous Victim, and Homozygous Suspect

The victim is of type $A_2A_3$, the suspect is of type $A_1A_1$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The victim and the suspect contributed to the stain.

$H_d$: The victim and an unknown person contributed to the stain.

The evidence sample is $C = (A_1A_1A_2A_3)$, so $C_g = (A_1A_2A_3)$ and $c = 3$.

Under $H_p$, the alleles from known contributors are $T_g = A_1A_2A_3$ and $n_T = 2$, $h_T = 1$, $t = 3$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, the alleles from known contributors are $T_g = A_2A_3$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from the unknown contributor are constrained to include $A_1$, and $x = 1$, $r = 1$. The unknown contributor may also carry alleles $A_1$, $A_2$ or $A_3$. The alleles from the person declared not to be a contributor are $V = A_1A_1$, so $n_V = 1$, $h_V = 0$.

The required probabilities are

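$$H_p:\ P_0(\{A_1A_1A_2A_3\}, \phi, \phi) = \frac{2^1 (1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta]}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$\begin{aligned}
H_d:\ P_1(\{A_2A_3\}, \{A_1\}, \{A_1A_1\})
&= \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_1+3\theta]}{2!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_3+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{[(1-\theta)p_1+2\theta]\,[(1-\theta)(p_1+2p_2+2p_3)+7\theta]}$$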
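Case 3 is the first case in which the unknown contributor has an unconstrained allele ($r = 1$), so the probability under $H_d$ is a sum over the possible genotypes $A_1A_1$, $A_1A_2$ and $A_1A_3$. The sketch below carries out that sum directly and compares it with the closed form. It is a minimal illustration under the same assumed product form for the probability of a set of alleles; the function names and numerical inputs are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def allele_set_prob(counts, freqs, theta):
    """Probability of an ordered set of alleles with the given copy counts."""
    num = Fraction(1)
    for allele, copies in counts.items():
        for j in range(copies):
            num *= (1 - theta) * freqs[allele] + j * theta
    den = Fraction(1)
    for j in range(sum(counts.values())):
        den *= (1 - theta) + j * theta
    return num / den


def case3_lr(p1, p2, p3, theta):
    """Case 3 likelihood ratio, summing over the unknown's free allele under H_d."""
    freqs = {1: p1, 2: p2, 3: p3}
    # Typed people are the same under both propositions:
    # victim A2A3 (heterozygote) and suspect A1A1 (homozygote).
    typed = Counter({1: 2, 2: 1, 3: 1})
    hp = 2 * allele_set_prob(typed, freqs, theta)
    hd = Fraction(0)
    for free in (1, 2, 3):
        unknown = Counter([1, free])          # unknown must carry A1
        orderings = 1 if free == 1 else 2     # 2!/2! or 2!/(1!1!)
        hd += 2 * orderings * allele_set_prob(typed + unknown, freqs, theta)
    return hp / hd


def case3_closed_form(p1, p2, p3, theta):
    """Closed-form Case 3 likelihood ratio."""
    return (1 + 3 * theta) * (1 + 4 * theta) / (
        ((1 - theta) * p1 + 2 * theta)
        * ((1 - theta) * (p1 + 2 * p2 + 2 * p3) + 7 * theta)
    )


# Hypothetical inputs, exact rational arithmetic.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]
theta = Fraction(3, 100)
print(float(case3_lr(*p, theta)), float(case3_closed_form(*p, theta)))
```

The three terms share the factors for the typed alleles, which is why the sum collapses to the single bracket $(1-\theta)(p_1+2p_2+2p_3)+7\theta$ in the closed form.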



Case 4: Four-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3A_4$. The two propositions are

$H_p$: The suspect and an unknown person contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2$ and $n_T = 1$, $h_T = 1$, $t = 2$. There are two alleles $A_3A_4$ from unknown contributors, but no alleles from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, so $T = T_g = \phi$ and $n_T = 0$, $h_T = 0$, $t = 0$. The alleles from unknown contributors are constrained to be $U = A_1A_2A_3A_4$ and $x = 2$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

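The required probabilities are

$$H_p:\ P_1(\{A_1A_2\}, \{A_3A_4\}, \phi) = \frac{2^1\,2!\,(1-\theta)^4 p_1p_2p_3p_4}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\ P_2(\phi, \{A_1A_2A_3A_4\}, \{A_1A_2\}) = \frac{2^1\,4!\,(1-\theta)^4 p_1p_2p_3p_4\,[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}{1!\,1!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)}{12[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]}$$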



Case 5: Three-Allele Mixture, Heterozygous Suspect, and One Unknown

The suspect is of type $A_1A_2$, and the crime sample of type $A_1A_2A_3$. The two propositions are

$H_p$: The suspect and one unknown person contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample $C$ contains the distinct alleles $C_g = (A_1A_2A_3)$, so $c = 3$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2$ and $n_T = 1$, $h_T = 1$, $t = 2$. The alleles from unknown contributors are constrained to include $A_3$ and may also include $A_1$, $A_2$ or $A_3$. There are no alleles from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, so $n_T = 0$, $h_T = 0$, $t = 0$. The alleles from the unknown contributors are constrained to include $A_1A_2A_3$, and $x = 2$, $r = 1$. The unknown contributors may also carry alleles $A_1$, $A_2$ or $A_3$. The alleles from the person declared not to be a contributor are $V = A_1A_2$ and $n_V = 1$, $h_V = 1$.

The required probabilities are

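$$\begin{aligned}
H_p:\ P_1(\{A_1A_2\}, \{A_3\}, \phi)
&= \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_2+\theta]}{1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)} \\
&\quad + \frac{2^1\,2!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_3+\theta]}{2!\,(1-\theta)(1+\theta)(1+2\theta)}
\end{aligned}$$

$$\begin{aligned}
H_d:\ P_2(\phi, \{A_1A_2A_3\}, \{A_1A_2\})
&= \frac{2^1\,4!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_1+2\theta][(1-\theta)p_2+\theta]}{2!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,4!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_2+2\theta]}{1!\,2!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)} \\
&\quad + \frac{2^1\,4!\,(1-\theta)^3 p_1p_2p_3\,[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta]}{1!\,1!\,2!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)}
\end{aligned}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)\,[(1-\theta)(2p_1+2p_2+p_3)+5\theta]}{12[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta]\,[(1-\theta)(p_1+p_2+p_3)+5\theta]}$$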
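With two unknown contributors and one unconstrained allele, the sums in Case 5 are easier to generate than to write out. The sketch below enumerates every set of unknown alleles that, together with the known contributors, shows exactly the alleles of the sample, and weights each by $(2x)!/\prod_i u_i!$. This weighting and the factor $2^{h_T+h_V}$ are assumptions read off the expressions in this Appendix, and the function names and inputs are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def allele_set_prob(counts, freqs, theta):
    """Probability of an ordered set of alleles with the given copy counts."""
    num = Fraction(1)
    for allele, copies in counts.items():
        for j in range(copies):
            num *= (1 - theta) * freqs[allele] + j * theta
    den = Fraction(1)
    for j in range(sum(counts.values())):
        den *= (1 - theta) + j * theta
    return num / den


def proposition_prob(known, n_unknown, noncontributors, sample, freqs, theta):
    """Probability of the typed profiles and the sample under one proposition."""
    typed = known + noncontributors
    n_het = sum(1 for a, b in typed if a != b)
    typed_counts = Counter(a for g in typed for a in g)
    known_alleles = {a for g in known for a in g}
    total = Fraction(0)
    for u in combinations_with_replacement(sorted(sample), 2 * n_unknown):
        if known_alleles | set(u) != set(sample):
            continue                      # sample alleles not all accounted for
        u_counts = Counter(u)
        orderings = factorial(2 * n_unknown)
        for c in u_counts.values():
            orderings //= factorial(c)
        total += orderings * allele_set_prob(typed_counts + u_counts, freqs, theta)
    return 2 ** n_het * total


def case5_lr(freqs, theta):
    """Case 5: suspect A1A2, sample A1A2A3."""
    sample, suspect = {1, 2, 3}, (1, 2)
    hp = proposition_prob([suspect], 1, [], sample, freqs, theta)
    hd = proposition_prob([], 2, [suspect], sample, freqs, theta)
    return hp / hd


# Hypothetical inputs, exact rational arithmetic.
freqs = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10)}
theta = Fraction(3, 100)
print(float(case5_lr(freqs, theta)))
```

The same `proposition_prob` helper reproduces the other cases by changing the lists of known contributors and typed non-contributors, which is one way to guard against algebra slips in the hand-derived sums.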



Case 6: Four-Allele Mixture and Two Heterozygous Suspects

The suspects are of type $A_1A_2$ and $A_3A_4$, and the crime sample is of type $A_1A_2A_3A_4$. The two propositions may be

$H_p$: The two suspects contributed to the stain.

$H_d$: Two unknown people contributed to the stain.

The evidence sample is $C = C_g = (A_1A_2A_3A_4)$ and $c = 4$.

Under $H_p$, the alleles from known contributors are $T = T_g = A_1A_2A_3A_4$ and $n_T = 2$, $h_T = 2$, $t = 4$. There are no alleles from unknown contributors or from people declared not to be contributors, so $n_V = h_V = 0$.

Under $H_d$, there are no alleles from known contributors, so $T = T_g = \phi$ and $n_T = h_T = t = 0$. The alleles from unknown contributors are constrained to be $U = A_1A_2A_3A_4$ and $x = 2$, $r = 0$. The alleles from people declared not to be contributors are $V = A_1A_2A_3A_4$ and $n_V = 2$, $h_V = 2$.



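The required probabilities are

$$H_p:\ P_0(\{A_1A_2A_3A_4\}, \phi, \phi) = \frac{2^2 (1-\theta)^4 p_1p_2p_3p_4}{(1-\theta)(1+\theta)(1+2\theta)}$$

$$H_d:\ P_2(\phi, \{A_1A_2A_3A_4\}, \{A_1A_2A_3A_4\}) = Q$$

where

$$Q = \frac{2^2\,4!\,(1-\theta)^4 p_1p_2p_3p_4\,[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta][(1-\theta)p_4+\theta]}{1!\,1!\,1!\,1!\,(1-\theta)(1+\theta)(1+2\theta)(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}$$

and the likelihood ratio is

$$LR = \frac{(1+3\theta)(1+4\theta)(1+5\theta)(1+6\theta)}{24[(1-\theta)p_1+\theta][(1-\theta)p_2+\theta][(1-\theta)p_3+\theta][(1-\theta)p_4+\theta]}$$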
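A short numerical check of Case 6 is sketched below. It builds the likelihood ratio from the two proposition probabilities, compares it with a closed form carrying one bracketed factor for each of the four alleles (as in the numerator of $Q$), and confirms that $\theta = 0$ gives $1/(24p_1p_2p_3p_4)$. The function names and the allele frequencies are hypothetical, and the product form for the probability of a set of alleles is assumed from the pattern of the Appendix expressions.

```python
from fractions import Fraction


def allele_set_prob(counts, freqs, theta):
    """Probability of an ordered set of alleles with the given copy counts."""
    num = Fraction(1)
    for allele, copies in counts.items():
        for j in range(copies):
            num *= (1 - theta) * freqs[allele] + j * theta
    den = Fraction(1)
    for j in range(sum(counts.values())):
        den *= (1 - theta) + j * theta
    return num / den


def case6_lr_direct(p, theta):
    """Case 6 likelihood ratio from the probabilities under H_p and H_d."""
    freqs = dict(enumerate(p, start=1))
    # H_p: the two heterozygous suspects account for all four alleles.
    hp = 2**2 * allele_set_prob({a: 1 for a in freqs}, freqs, theta)
    # H_d: two unknowns carry A1A2A3A4 (4! orderings); both suspects are in V.
    hd = 2**2 * 24 * allele_set_prob({a: 2 for a in freqs}, freqs, theta)
    return hp / hd


def case6_lr_closed(p, theta):
    """Closed-form Case 6 likelihood ratio with four bracketed factors."""
    num = Fraction(1)
    for k in (3, 4, 5, 6):
        num *= 1 + k * theta
    den = Fraction(24)
    for p_i in p:
        den *= (1 - theta) * p_i + theta
    return num / den


# Hypothetical allele frequencies, exact rational arithmetic.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(1, 20)]
for theta in (Fraction(0), Fraction(1, 100), Fraction(3, 100)):
    print(float(theta), float(case6_lr_closed(p, theta)))
```

For these particular frequencies the likelihood ratio falls as $\theta$ increases, since allowing for shared ancestry makes it more probable that two unknown people carry the suspects' alleles.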



